Cross-corpus Readability Compatibility Assessment for English Texts

Authors

Abstract

Text readability assessment has gained significant attention from researchers in various domains. However, the lack of exploration into corpus compatibility poses a challenge, as different research groups utilize different corpora. In this study, we propose a novel evaluation framework, Cross-corpus text Readability Compatibility Assessment (CRCA), to address this issue. The framework encompasses three key components: (1) Corpora: CEFR, CLEC, CLOTH, NES, OSP, and RACE; linguistic features, GloVe word vector representations, and their fusion features were extracted. (2) Classification models: machine learning methods (XGBoost, SVM) and deep learning methods (BiLSTM, Attention-BiLSTM) were employed. (3) Compatibility metrics: RJSD, RRNSS, and NDCG. Our findings revealed: (1) validated corpus compatibility, with OSP standing out as significantly different from the other datasets; (2) an adaptation effect among corpora, feature representations, and classification methods; and (3) consistent outcomes across the three metrics, validating the robustness of the framework. The results of this study offer valuable insights into corpus selection, feature representation, and classification methods, and the framework can also serve as a starting point for cross-corpus transfer learning.
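The abstract names NDCG among its compatibility metrics but gives no implementation details in this excerpt (RJSD and RRNSS are likewise only named). As a minimal, hedged sketch of the cross-corpus protocol the abstract implies, the snippet below trains one of the named classifiers on one corpus, ranks another corpus's texts by predicted difficulty, and scores the ranking with standard NDCG; all data and feature names here are hypothetical placeholders, not the authors' code.

```python
import numpy as np
from xgboost import XGBClassifier  # one of the ML models named in the abstract

def dcg(rels):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    rels = np.asarray(rels, dtype=float)
    return float(np.sum(rels / np.log2(np.arange(2, rels.size + 2))))

def ndcg(rels):
    """Normalized DCG: 1.0 means the ranking matches the ideal ordering."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

def cross_corpus_ndcg(model, X_train, y_train, X_test, y_test):
    """Train on one corpus, rank another corpus's texts by predicted
    difficulty, and check how well that ranking agrees with gold labels."""
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)          # predicted difficulty levels
    order = np.argsort(-predicted)             # hardest-first ranking
    return ndcg(np.asarray(y_test)[order])     # gold labels as relevance

# Hypothetical usage with random stand-in features (real inputs would be the
# linguistic/GloVe/fusion representations described in the abstract):
# rng = np.random.default_rng(0)
# X_a, y_a = rng.normal(size=(200, 50)), rng.integers(0, 5, size=200)
# X_b, y_b = rng.normal(size=(150, 50)), rng.integers(0, 5, size=150)
# print(cross_corpus_ndcg(XGBClassifier(), X_a, y_a, X_b, y_b))
```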

Similar Articles

Readability Assessment of Translated Texts

In this paper we investigate how readability varies between texts originally written in English and texts translated into English. For quantification, we analyze several factors that are relevant in assessing readability – shallow, lexical and morpho-syntactic features – and we employ the widely used Flesch-Kincaid formula to measure the variation of the readability level between original Engli...
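The Flesch-Kincaid grade-level formula the authors mention is fully standard, so it can be shown directly; only the naive vowel-group syllable counter below is a simplifying assumption (production code typically uses a pronunciation dictionary).

```python
import re

def count_syllables(word):
    """Crude vowel-group heuristic; real systems use a pronunciation dict."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59"""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n = max(1, len(words))
    syllable_count = sum(count_syllables(w) for w in words)
    return 0.39 * (n / sentences) + 11.8 * (syllable_count / n) - 15.59

print(round(flesch_kincaid_grade("The cat sat on the mat. It was happy."), 2))
```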

Sorting Texts by Readability

This article presents a novel approach for readability assessment through sorting. A comparator that judges the relative readability of two texts is generated through machine learning, and a given set of texts is sorted by this comparator. Our proposal is advantageous because it addresses the lack of training data: constructing the comparator only requires training...
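Since comparator-based sorting is the core of this approach, here is a minimal sketch assuming a pairwise comparator has already been learned; `toy_comparator` below is a hypothetical stand-in, not the paper's model.

```python
from functools import cmp_to_key

def sort_by_readability(texts, compare_pair):
    """Sort texts easiest-first using a learned pairwise comparator.
    compare_pair(a, b) returns -1 if a reads easier than b, 1 if harder,
    0 if indistinguishable (e.g. a classifier over feature differences)."""
    return sorted(texts, key=cmp_to_key(compare_pair))

# Hypothetical stand-in comparator: shorter average word length = easier.
def toy_comparator(a, b):
    def avg_len(text):
        words = text.split()
        return sum(map(len, words)) / max(1, len(words))
    return (avg_len(a) > avg_len(b)) - (avg_len(a) < avg_len(b))

print(sort_by_readability(["An egregiously sesquipedalian sentence.",
                           "A short one."], toy_comparator))
```

One design caveat: a learned comparator need not be transitive, so comparison sorting yields an approximate ordering; the excerpt does not say how the paper handles this.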

A multivariate model for classifying texts' readability

We report on results from using the multivariate readability model SVIT to classify texts into various levels. We investigate how the language features integrated in the SVIT model can be transformed to values on known criteria like vocabulary, grammatical fluency and propositional knowledge. Such text criteria, sensitive to content, readability and genre in combination with the profile of a st...

Measuring Readability for Japanese Learners of English

This paper describes the relative effectiveness of seven variables of three categories in predicting the readability of the EFL texts used in the Japanese context. The factors examined in our research were (1) word difficulty and (2) idiom difficulty, in addition to the commonly used variables, sentence length (SL) and word length (WL). In the analysis of word difficulty three measures were con...
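The excerpt does not state which statistical model combines the seven variables, so the sketch below simply fits a multiple linear regression over the four variable categories it names; all feature values and labels are fabricated placeholders for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns mirror the variables named in the abstract: word difficulty,
# idiom difficulty, sentence length (SL), word length (WL). The numbers
# are placeholder values purely to make the sketch runnable.
X = np.array([
    [0.30, 0.05, 12.0, 4.2],
    [0.55, 0.10, 18.5, 4.9],
    [0.70, 0.20, 24.0, 5.4],
    [0.20, 0.02,  9.0, 3.9],
])
y = np.array([2.1, 3.4, 4.8, 1.5])  # placeholder readability level per text

model = LinearRegression().fit(X, y)
print(dict(zip(["word_diff", "idiom_diff", "SL", "WL"],
               model.coef_.round(3))))
```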

Measuring Readability of Polish Texts: Baseline Experiments

Measuring the readability of a text is the first sensible step toward its simplification. In this paper we present an overview of the most common approaches to the automatic measurement of readability. Of those described, we implemented and evaluated the Gunning FOG index and the Flesch-based Pisarek method. We also present two other approaches. The first one is based on measuring distributional lexical similarity ...
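Of the measures named, the Gunning FOG index has a fixed, well-known definition, shown below with the same crude syllable heuristic as earlier; the Polish-specific Pisarek method is only named in the excerpt, so it is not reproduced here.

```python
import re

def syllables(word):
    """Crude vowel-group count; a hyphenation dictionary is more accurate."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    """Gunning FOG index:
    0.4 * ((words / sentences) + 100 * (complex_words / words)),
    where complex words have three or more syllables."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n = max(1, len(words))
    complex_words = sum(1 for w in words if syllables(w) >= 3)
    return 0.4 * (n / sentences + 100 * complex_words / n)

print(round(gunning_fog("Officials discussed the extraordinary situation."), 2))
```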

Journal

Journal Title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3315834